NSF PAR Search | NSF Public Access Repository

Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes

https://doi.org/10.1109/CVPR52733.2024.00691

Shrivastava, Gaurav; Shrivastava, Abhinav (June 2024, IEEE)

Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning

https://doi.org/10.1109/CVPR52733.2024.01371

Saini, Nirat; Pham, Khoi; Shrivastava, Abhinav (June 2024, IEEE)

ARDuP: Active Region Video Diffusion for Universal Policies

https://doi.org/10.1109/IROS58592.2024.10802264

Huang, Shuaiyi; Levy, Mara; Jiang, Zhenyu; Anandkumar, Anima; Zhu, Yuke; Fan, Linxi; Huang, De-An; Shrivastava, Abhinav (October 2024, IEEE)

WayEx: Waypoint Exploration using a Single Demonstration

https://doi.org/10.1109/ICRA57147.2024.10611088

Levy, Mara; Saini, Nirat; Shrivastava, Abhinav (May 2024, IEEE)

Composing Object Relations and Attributes for Image-Text Matching

https://doi.org/10.1109/CVPR52733.2024.01361

Pham, Khoi; Huynh, Chuong; Lim, Ser-Nam; Shrivastava, Abhinav (June 2024, IEEE)

Video Decomposition Prior: Editing Videos Layer By Layer

Shrivastava, Gaurav; Lim, Ser-Nam; Shrivastava, Abhinav (May 2024, The Twelfth International Conference on Learning Representations (ICLR))

Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing Their Contributions

https://doi.org/10.1109/CVPR52733.2024.01042

Padmanabhan, Namitha; Gwilliam, Matthew; Kumar, Pulkit; Maiya, Shishira R; Ehrlich, Max; Shrivastava, Abhinav (June 2024, IEEE)

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

https://doi.org/10.1109/CVPR52733.2024.01282

He, Bo; Li, Hengduo; Jang, Young Kyun; Jia, Menglin; Cao, Xuefei; Shah, Ashish; Lim, Ser-Nam; Shrivastava, Abhinav (June 2024, IEEE)

ASIC: Aligning Sparse in-the-wild Image Collections

https://doi.org/10.1109/ICCV51070.2023.00382

Gupta, Kamal; Jampani, Varun; Esteves, Carlos; Shrivastava, Abhinav; Makadia, Ameesh; Snavely, Noah; Kar, Abhishek (October 2023, Proceedings of ICCV)

We present a method for joint alignment of sparse in-the-wild image collections of an object category. Most prior works assume either ground-truth keypoint annotations or a large dataset of images of a single object category. However, neither of these assumptions holds for the long tail of objects present in the world. We present a self-supervised technique that directly optimizes on a sparse collection of images of a particular object or object category to obtain consistent dense correspondences across the collection. We use pairwise nearest neighbors obtained from deep features of a pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches, and make them dense and accurate by optimizing a neural network that jointly maps the image collection into a learned canonical grid. Experiments on the CUB, SPair-71k, and PF-Willow benchmarks demonstrate that our method can produce globally consistent and higher-quality correspondences across the image collection when compared to existing self-supervised methods. Code and other material will be made available at https://kampta.github.io/asic.
Chop & Learn: Recognizing and Generating Object-State Compositions

https://doi.org/10.1109/ICCV51070.2023.01852

Saini, Nirat; Wang, Hanyu; Swaminathan, Archana; Jayasundara, Vinoj; He, Bo; Gupta, Kamal; Shrivastava, Abhinav (October 2023, Proceedings of ICCV)

Recognizing and generating object-state compositions has been a challenging task, especially when generalizing to unseen compositions. In this paper, we study the task of cutting objects in different styles and the resulting object state changes. We propose a new benchmark suite, Chop & Learn, to accommodate the needs of learning objects and different cut styles using multiple viewpoints. We also propose a new task of Compositional Image Generation, which can transfer learned cut styles to different objects by generating novel object-state images. Moreover, we use the videos for Compositional Action Recognition and show valuable uses of this dataset for multiple video tasks. Project website: https://chopnlearn.github.io.